Compiling a Corpus of Taiwanese Students' Spoken English
Abstract
This paper reports the compilation of a corpus of Taiwanese students’ spoken English, which is one of the twenty sub-corpora of the Louvain International Database of Spoken English Interlanguage (LINDSEI) (Gilquin et al., 2010). LINDSEI is one of the largest corpora of learner speech. The compilation process follows the design criteria of LINDSEI so as to ensure comparability across sub-corpora. The participants, the procedures for data collection and the process of transcription are all documented. Sixty third- or fourth-year English majors in Taiwan are interviewed and recorded in English. Each interview is accompanied by a profile which contains information about learner variables such as age, gender, mother tongue, country, English learning context, knowledge of other foreign languages and amount of time spent in English-speaking countries, and about interviewer variables such as gender, mother tongue, knowledge of foreign languages and degree of familiarity with the interviewees. A further variable, the learners’ English proficiency level based on the results of international standardised tests, is also collected; this is not available in other sub-corpora of LINDSEI. The participants’ proficiency is similarly distributed across the B1 to C1 levels of the Common European Framework of Reference. This paper concludes with a discussion of the contributions and research potential of the corpus.
Similar resources
Vague Language and Interpersonal Communication: An Analysis of Adolescent Intercultural Conversation
This paper is concerned with the analysis of the spoken language of teenagers, taken from a newly developed specialised corpus, the British and Taiwanese Teenage Intercultural Communication Corpus (BATTICC). More specifically, the study employs a discourse analytical approach to examine vague language in an intercultural context among a group of British and Taiwanese adolescents, paying particul...
Compiling Taiwanese Learner Corpus of English
This paper presents the mechanisms of and criteria for compiling a new learner corpus of English, the quantitative characteristics of the corpus and a practical example of its pedagogical application. The Taiwanese Learner Corpus of English (TLCE), probably the largest annotated learner corpus of English in Taiwan so far, contains 2105 pieces of English writing (around 730,000 words) from Taiwa...
Error Analysis of Taiwanese University Students’ English Essay Writing: A Longitudinal Corpus Study
Writing is considered one of the most difficult skills in EFL/ESL. Thus, meticulous recognition and classification of students’ errors in certain contexts is a worthwhile endeavor which provides us with both diagnostic and prognostic power. Accordingly, a total of 430 students in 15 English writing classes held during 12 consecutive semesters in a private university in central Taiwan were the s...
Introduction: Compiling and analysing the Spoken British National Corpus 2014
For over twenty years, the British National Corpus has been one of the most widely known and used corpora. It is almost impossible to attend an international corpus linguistics conference such as Corpus Linguistics, ICAME (International Computer Archive of Modern and Medieval English), AACL (American Association for Corpus Linguistics) or APCLC (Asia Pacific Corpus Linguistics Conference) witho...
Sampling situated discourse for spoken Chinese corpus
Corpus sampling and representativeness constitute the first and a very important issue in corpus compilation. This paper first makes a brief assessment of three corpora of spoken English with regard to corpus sampling and representativeness. It then describes the way the problem was dealt with in compiling a spoken Chinese corpus of situated discourse.